Optional validation loop #6

Open
wants to merge 25 commits into
base: main

TJ-Solergibert

This PR pushes a small fix that makes it possible to disable the validation loop. As we discussed in the last meeting, the parser is not able to properly differentiate between NanosetDatasetsArgs and MultilingualNanosetDatasetsArgs.

You can deactivate the validation loop either by NOT setting validation_folder or by manually setting val_check_interval = -1.
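A minimal sketch of how these two switches could combine, for illustration only. ValidationSettings, validation_enabled and should_validate are hypothetical names, not nanotron's actual config classes or trainer code; only the field names validation_folder and val_check_interval come from this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidationSettings:
    """Hypothetical stand-in for the relevant config fields (not nanotron's real class)."""

    validation_folder: Optional[str] = None  # leaving this unset disables validation
    val_check_interval: int = -1  # -1 disables validation explicitly


def validation_enabled(settings: ValidationSettings) -> bool:
    # Validation runs only if a folder is given AND the interval is positive.
    return settings.validation_folder is not None and settings.val_check_interval > 0


def should_validate(settings: ValidationSettings, step: int) -> bool:
    # Assumed behaviour: validate every `val_check_interval` training steps.
    return validation_enabled(settings) and step % settings.val_check_interval == 0


if __name__ == "__main__":
    print(validation_enabled(ValidationSettings()))
    print(validation_enabled(ValidationSettings("val/", 100)))
    print(validation_enabled(ValidationSettings("val/", -1)))
```

Under these assumptions, either switch on its own is enough to turn validation off, which matches the behaviour described above.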
image

The parser is fairly limited: when training_folder and validation_folder are specified, it won't be able to parse the config as MultilingualNanosetDatasetsArgs, so the error above will never be raised. Instead, it will raise the typical parser error, which looks like this:

dacite.exceptions.UnionMatchError: can not match type "dict" to any type of "data_stages.data.dataset" union: typing.Union[nanotron.config.config.PretrainDatasetsArgs, nanotron.config.config.MultilingualNanosetDatasetsArgs]

Nevertheless, even though the parser error is raised instead of the one complaining about the missing languages, the resulting behavior is the desired one.

PS: Why am I pushing so many commits? Haven't they been merged yet? Do they come from the future?

3 participants